On-line String Matching in Highly Similar DNA Sequences
Abstract
We consider the problem of on-line exact string matching of a pattern in a set of highly similar sequences, which is useful when indexing the sequences is not feasible. As a preliminary study, we restrict the problem to a specific case in which we adapt the classical Morris-Pratt algorithm to consider borders with errors. We give an original algorithm for computing borders at Hamming distance 1. Experimental results show that our algorithm is much faster than searching for the pattern in each sequence with a very fast on-line exact string matching algorithm.
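The notions named in the abstract can be illustrated with a minimal Python sketch. It shows the classical Morris-Pratt border table and search, plus a brute-force computation of borders with at most one mismatch. This is not the algorithm from the paper: the function names, the convention `b[0] = -1`, and the reading of "border at Hamming distance 1" as "a prefix and an equal-length suffix that differ in at most one position" are assumptions made here for illustration only.

```python
def borders(x):
    """Morris-Pratt border table.

    b[i] is the length of the longest proper border of x[:i]
    (a string that is both a prefix and a suffix of x[:i]); b[0] = -1.
    """
    b = [0] * (len(x) + 1)
    b[0] = -1
    k = -1
    for i in range(len(x)):
        # Fall back through shorter borders until one can be extended by x[i].
        while k >= 0 and x[k] != x[i]:
            k = b[k]
        k += 1
        b[i + 1] = k
    return b


def mp_search(pattern, text):
    """Return the start positions of all exact occurrences of a non-empty pattern."""
    b = borders(pattern)
    m = len(pattern)
    hits = []
    k = 0  # length of the pattern prefix currently matched
    for j, c in enumerate(text):
        while k >= 0 and pattern[k] != c:
            k = b[k]
        k += 1
        if k == m:
            hits.append(j - m + 1)
            k = b[k]  # continue with the longest border of the whole pattern
    return hits


def hamming(u, v):
    """Number of mismatching positions between two equal-length strings."""
    return sum(a != c for a, c in zip(u, v))


def borders_hamming1_naive(x):
    """Brute-force table of borders with at most one mismatch (assumed definition).

    b1[i] is the largest l < i such that x[:l] and x[i-l:i] differ in at most
    one position. Cubic time in the worst case; for illustration only.
    """
    n = len(x)
    b1 = [0] * (n + 1)
    b1[0] = -1
    for i in range(1, n + 1):
        for l in range(i - 1, -1, -1):
            if hamming(x[:l], x[i - l:i]) <= 1:
                b1[i] = l
                break
    return b1


if __name__ == "__main__":
    p = "ACGACT"
    print(borders(p))
    print(borders_hamming1_naive(p))
    print(mp_search(p, "TTACGACGACTA"))
```

For the pattern `ACGACT`, the exact border of the whole string has length 0, while the prefix `ACG` and the suffix `ACT` differ in a single position, so the one-mismatch border has length 3. Longer borders with errors of this kind are what a Morris-Pratt-style search over several highly similar sequences could exploit.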
Similar resources
A Fast Algorithm for Approximate String Matching on Gene Sequences
Approximate string matching is a fundamental and challenging problem in computer science, for which a fast algorithm is highly demanded in many applications including text processing and DNA sequence analysis. In this paper, we present a fast algorithm for approximate string matching, called FAAST. It aims at solving a popular variant of the approximate string matching problem, the k-mismatch p...
An Index based Pattern Matching using Multithreading
Pattern matching, the problem of finding subsequences within a long sequence, is essential for many applications such as information retrieval, disease analysis, structural and functional analysis, logic programming, theorem-proving, term rewriting and DNA-computing. In computational biology, exact string matching algorithms are an essential component of DNA applications. Many databases li...
Comparison of sequence alignment algorithms
The fact that biological sequences can be represented as strings belonging to a finite alphabet (A, C, G, and T for DNA) plays an important role in connecting biology to computer science. String representation allows researchers to apply various string comparison techniques available in computer science. As a result, various applications have been developed that facilitate the task of sequence ...
All-Against-All Sequence
In this paper we present an algorithm which attempts to align pairs of subsequences from a database of DNA sequences. The algorithm simulates the classical dynamic programming alignment algorithm over a digital index of the database. The running time of the algorithm is subquadratic on average with respect to the database size. A similar algorithm solves the approximate string matching problem ...
BlastGraph: intensive approximate pattern matching in string graphs and de-Bruijn graphs
Many de novo assembly tools have been created in the last few years to assemble short reads generated by high-throughput sequencing platforms. The core of almost all these assemblers is a string graph data structure that links reads together. This motivates our work: BlastGraph, a new algorithm performing intensive approximate string matching between a set of query sequences and a string graph. ...
Journal title:
- Mathematics in Computer Science
Volume 11, Issue
Pages -
Publication date 2014